The Effects of Query-based Sampling on Automatic Database Selection Algorithms
Keywords: Distributed Collections, Merging Search Results/Information Synthesis, Database Selection
Abstract
Database selection algorithms need to know the subject areas covered by each text database, but this metadata can be difficult to acquire in multi-party environments, such as the Internet, where each party has different interests and capabilities. Query-based sampling is a relatively new technique in which metadata is inferred by interacting with each text database and observing the outcomes. Query-based sampling has been proposed as a solution to the problem of discovering the contents of each database in multi-party environments, but its generality and effectiveness had not been tested under a wide range of conditions. This paper investigates the generality and effectiveness of query-based sampling with three well-known database selection algorithms (gGlOSS, CORI, CVV). Experimental results support the generality of query-based sampling as a solution for acquiring database descriptions in multi-party environments. The experiments also compare the effectiveness of the database selection algorithms under different conditions.
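The sampling idea described in the abstract (query a database, observe what comes back, and accumulate a description from the results) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `search` callable standing in for a database's search interface, the single-term queries, the random choice of the next query term from the vocabulary learned so far, the stopping rule, and the toy three-document database.

```python
import random
from collections import Counter


def query_based_sample(search, seed_term, max_docs=4, docs_per_query=2, rng=None):
    """Build a term -> document-frequency description by querying a database.

    `search(term, n)` is a hypothetical callable that returns up to `n`
    (doc_id, text) pairs; it stands in for the remote search interface.
    """
    rng = rng or random.Random(0)
    description = Counter()  # learned term -> number of sampled docs containing it
    seen = set()             # ids of documents already sampled
    tried = set()            # query terms already issued
    term = seed_term
    while term is not None and len(seen) < max_docs:
        tried.add(term)
        for doc_id, text in search(term, docs_per_query):
            if doc_id in seen or len(seen) >= max_docs:
                continue
            seen.add(doc_id)
            description.update(set(text.lower().split()))
        # Assumed policy: pick the next query term at random from learned terms.
        candidates = sorted(t for t in description if t not in tried)
        term = rng.choice(candidates) if candidates else None
    return description, seen


# Toy stand-in for a remote text database (illustrative data only).
TOY_DB = {
    1: "database selection algorithms",
    2: "query sampling database",
    3: "distributed retrieval query",
}


def toy_search(term, n):
    """Return up to n documents whose text contains the query term."""
    hits = [(i, t) for i, t in sorted(TOY_DB.items()) if term in t.split()]
    return hits[:n]


description, sampled = query_based_sample(toy_search, "database")
```

In this sketch the learned `description` is a plain document-frequency table, which is the kind of summary a selection algorithm could consume in place of metadata supplied by the database itself. Which statistics each of gGlOSS, CORI, and CVV actually requires is not stated in the abstract.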
Related resources
Distributed Multisearch and Resource Selection for the TREC Million Query Track
A distributed information retrieval system with resource‐selection and result‐set merging capability was used to search subsets of the GOV2 document corpus for the 2008 TREC Million Query Track. The GOV2 collection was partitioned into host‐name subcollections and distributed to multiple remote machines. The Multisearch demonstration application restricted each search to a fraction of the avail...
Separating indexes from data: a distributed scheme for secure database outsourcing
Database outsourcing is an idea to eliminate the burden of database management from organizations. Since data is a critical asset of organizations, preserving its privacy from outside adversary and untrusted server should be warranted. In this paper, we present a distributed scheme based on storing shares of data on different servers and separating indexes from data on a distinct server. Shamir...
Utilizing Context in Ranking Results from Distributed CBIR
Selection and ranking of relevant images from image collections remains a problem in content-based image retrieval. This problem becomes even more visible and acute when attempting to merge and rank multiple result sets retrieved from a distributed database environment. This paper presents findings from a project that investigated if combining text and image retrieval algorithms with the use of...
Analysis of User query refinement behavior based on semantic features: user log analysis of Ganj database (IranDoc)
Background and Aim: Information systems cannot be well designed or developed without a clear understanding of needs of users, manner of their information seeking and evaluating. This research has been designed to analyze the Ganj (Iranian research institute of science and technology database) users’ query refinement behaviors via log analysis. Methods: The method of this research is log anal...
Collection Profiling for Collection Fusion in Distributed Information Retrieval Systems
Discovering resource descriptions and merging results obtained from remote search engines are two key issues in distributed information retrieval studies. In uncooperative environments, query-based sampling and normalizing scores based merging strategies are well-known approaches to solve such problems. However, such approaches only consider the content of the remote database and do not conside...
Publication date: 2000